Abstract

Recently Johansson ([19]) and Johnstone ([21]) proved that the distribution
of the (properly rescaled) largest principal component of the complex (real)
Wishart matrix $X^*X$ ($X^tX$) converges to the Tracy-Widom law as $n, p$
(the dimensions of $X$) tend to $\infty$ in some ratio $n/p \to \gamma > 0$.
We extend these results in two directions. First, we prove that the joint
distribution of the first, second, third, etc. eigenvalues of a Wishart matrix
converges (after a proper rescaling) to the Tracy-Widom distribution. Second,
we explain how the combinatorial machinery developed for Wigner random
matrices in [28]-[30] allows one to extend the results of Johansson and
Johnstone to the case of $X$ with non-Gaussian entries, provided
$n - p = O(p^{1/3})$. We also prove that
$\lambda_{\max} \le (n^{1/2} + p^{1/2})^2 + O(p^{1/2}\log(p))$ (a.e.) for
general $\gamma > 0$.



1 Introduction 

Sample covariance matrices were introduced by statisticians about seventy 
years ago ([25], [39]). There is a large literature on the subject (see e.g. 
[2]-[6], [9], [12]-[18], [21], [23], [36]-[37]). We start with the real case. 

1.1 Real Sample Covariance Matrices

The ensemble consists of $p$-dimensional random matrices $A_p = X^tX$ ($X^t$
denotes the transpose of $X$), where $X$ is an $n \times p$ matrix with
independent real random entries $x_{ij}$, $1 \le i \le n$, $1 \le j \le p$,
such that

(i)
$$E x_{ij} = 0, \tag{1.1}$$
$$E (x_{ij})^2 = 1, \tag{1.2}$$
$1 \le i \le n$, $1 \le j \le p$.

To prove the results of Theorems 2, 3 below we will need some additional
assumptions:

(ii) The random variables $x_{ij}$, $1 \le i \le n$, $1 \le j \le p$, have
symmetric laws of distribution.

(iii) All moments of these random variables are finite; in particular, (ii)
implies that all odd moments vanish.

(iv) The distributions of $x_{ij}$ decay at infinity at least as fast as a
Gaussian distribution, namely

$$E (x_{ij})^{2m} \le (\mathrm{const}\, m)^m. \tag{1.3}$$

Here and below we denote by const various positive real numbers that do not
depend on $n, p, i, j$.

Complex sample covariance matrices are defined in a similar way.

1.2 Complex Sample Covariance Matrices

The ensemble consists of $p$-dimensional random matrices $A_p = X^*X$ ($X^*$
denotes the conjugate transpose of $X$), where $X$ is an $n \times p$ matrix
with independent complex random entries $x_{ij}$, $1 \le i \le n$,
$1 \le j \le p$, such that

(1.4)

$1 \le i \le n$, $1 \le j \le p$.


The additional assumptions in the complex case mirror those from the
real case:

(ii') The random variables $\mathrm{Re}\, x_{ij}$, $\mathrm{Im}\, x_{ij}$,
$1 \le i \le n$, $1 \le j \le p$, have symmetric laws of distribution.

(iii') All moments of these random variables are finite; in particular, (ii')
implies that all odd moments vanish.

(iv') The distributions of $\mathrm{Re}\, x_{ij}$, $\mathrm{Im}\, x_{ij}$
decay at infinity at least as fast as a Gaussian distribution, namely

$$E |x_{ij}|^{2m} \le (\mathrm{const}\, m)^m. \tag{1.7}$$



Remark 1 The archetypal example of a sample covariance matrix is the
$p$-variate Wishart distribution with $n$ degrees of freedom and identity
covariance. It corresponds to

$$x_{ij} \sim N(0, 1), \quad 1 \le i \le n, \ 1 \le j \le p, \tag{1.8}$$

in the real case, and to

$$\mathrm{Re}\, x_{ij}, \ \mathrm{Im}\, x_{ij} \sim N(0, 1), \quad 1 \le i \le n, \ 1 \le j \le p, \tag{1.9}$$

in the complex case.
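
As a small sanity check, the Gaussian entries of Remark 1 satisfy a moment bound of the form $E x^{2m} \le (\mathrm{const}\, m)^m$. The sketch below uses the standard identity $E x^{2m} = (2m-1)!!$ for $x \sim N(0,1)$; the function names and the choice const = 2 are illustrative assumptions, not taken from the paper.

```python
def gaussian_even_moment(m):
    """Return E[x^(2m)] for x ~ N(0, 1), i.e. the double factorial (2m - 1)!!."""
    value = 1
    for odd in range(1, 2 * m, 2):
        value *= odd
    return value


def satisfies_moment_bound(m, const=2):
    """Check E[x^(2m)] <= (const * m)^m for a standard Gaussian entry.

    const = 2 is an illustrative choice: (2m - 1)!! is a product of m odd
    numbers, each smaller than 2m.
    """
    return gaussian_even_moment(m) <= (const * m) ** m
```

The arithmetic is exact because Python integers have arbitrary precision.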

It was proved in [23], [17], [37] that if (i) ((i') in the complex case) is
satisfied,

$$n/p \to \gamma \ge 1 \ \text{as} \ p \to \infty, \quad \text{and} \quad E|x_{ij}|^{2+\delta} \le \mathrm{const}, \tag{1.10}$$

then the empirical distribution function of the eigenvalues of $A_p/n$
converges to a non-random limit

$$G_{A_p/n}(x) = \frac{1}{p}\, \#\{\lambda_k^{(p)} \le x, \ k = 1, \ldots, p\} \to G(x) \quad \text{(a.s.)}, \tag{1.11}$$

where $\lambda_1^{(p)}, \ldots, \lambda_p^{(p)}$ are the eigenvalues (all
real) of $A_p/n$, and $G(x)$ is defined by its density $g(x)$:

$$g(x) = \begin{cases} \dfrac{\gamma}{2\pi x}\sqrt{(b - x)(x - a)}, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases} \qquad a = (1 - \gamma^{-1/2})^2, \quad b = (1 + \gamma^{-1/2})^2.$$

Since the spectrum of $XX^*$ differs from the spectrum of $X^*X$ only by
$(n - p)$ null eigenvalues, the limiting spectral distribution in the case
$0 < \gamma < 1$ remains the same, except for an atom of mass $1 - \gamma$ at
the origin. From now on we will always assume that $p \le n$; however, our
results remain valid for $p > n$ as well.
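
The limiting density can be checked numerically. The sketch below assumes the density has the form $g(x) = \frac{\gamma}{2\pi x}\sqrt{(b-x)(x-a)}$ on $[a, b]$ with $a, b = (1 \mp \gamma^{-1/2})^2$, and verifies with a midpoint rule that it has total mass 1 and mean 1 for $\gamma > 1$. The helper names and the step count are illustrative assumptions.

```python
import math


def mp_support(gamma):
    """Endpoints a, b of the support for n/p -> gamma."""
    a = (1.0 - gamma ** -0.5) ** 2
    b = (1.0 + gamma ** -0.5) ** 2
    return a, b


def mp_density(x, gamma):
    """Limiting eigenvalue density g(x) of A_p / n."""
    a, b = mp_support(gamma)
    if x <= a or x >= b:
        return 0.0
    return gamma * math.sqrt((b - x) * (x - a)) / (2.0 * math.pi * x)


def midpoint_integral(f, lo, hi, steps=20000):
    """Plain midpoint rule; adequate here because g vanishes at both endpoints."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def mass_and_mean(gamma):
    """Return (integral of g, integral of x * g) over the support."""
    a, b = mp_support(gamma)
    mass = midpoint_integral(lambda x: mp_density(x, gamma), a, b)
    mean = midpoint_integral(lambda x: x * mp_density(x, gamma), a, b)
    return mass, mean
```

The mean equals 1 because $E\,\mathrm{Trace}(A_p/n) = p$ when the entries have unit variance.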

The distribution of the largest eigenvalues attracts special attention
(see e.g. [21], Section 1.2). It was shown by Geman ([15]) in the i.i.d. case
that if $E|x_{ij}|^{6+\delta} < \infty$, the largest eigenvalue of $A_p/n$
converges to $(1 + \gamma^{-1/2})^2$ almost surely. A few years later Yin, Bai,
Krishnaiah and Silverstein ([36], [3]) showed (in the i.i.d. case) that the
finiteness of the fourth moment is a necessary and sufficient condition for
the almost sure convergence (see also [27]). These results state that
$\lambda_{\max}(A_p) = (n^{1/2} + p^{1/2})^2 + o(n + p)$. However, no results
were known about the rate of convergence until recently, when Johansson
([19]) and Johnstone ([21]) proved the following theorem in the Gaussian (real
and complex) cases.

Theorem Suppose that a matrix $A_p = X^tX$ ($A_p = X^*X$) has a real
(complex) Wishart distribution (defined in Remark 1 above) and
$n/p \to \gamma > 0$. Then

$$\frac{\lambda_{\max}(A_p) - \mu_{n,p}}{\sigma_{n,p}},$$

where

$$\mu_{n,p} = (n^{1/2} + p^{1/2})^2, \tag{1.12}$$
$$\sigma_{n,p} = (n^{1/2} + p^{1/2})(n^{-1/2} + p^{-1/2})^{1/3}, \tag{1.13}$$

converges in distribution to the Tracy-Widom law ($F_1$ in the real case,
$F_2$ in the complex case).
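
The centering and scaling constants are easy to evaluate. The sketch below computes $\mu_{n,p}$ and $\sigma_{n,p}$ from (1.12), (1.13) and rescales the largest eigenvalue of one simulated real Wishart matrix. The power iteration, the matrix sizes and all function names are illustrative assumptions; this is not the method of [19] or [21].

```python
import math
import random


def centering_and_scaling(n, p):
    """Return (mu_{n,p}, sigma_{n,p}) as in (1.12), (1.13)."""
    root_sum = math.sqrt(n) + math.sqrt(p)
    mu = root_sum ** 2
    sigma = root_sum * (1.0 / math.sqrt(n) + 1.0 / math.sqrt(p)) ** (1.0 / 3.0)
    return mu, sigma


def largest_eigenvalue(n, p, rng, iterations=300):
    """Approximate the largest eigenvalue of X^t X for an n x p matrix X with
    N(0, 1) entries by power iteration (a Rayleigh quotient, hence a lower bound)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        Xv = [sum(x * c for x, c in zip(row, v)) for row in X]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    Xv = [sum(x * c for x, c in zip(row, v)) for row in X]
    return sum(c * c for c in Xv)  # v^t X^t X v with |v| = 1


def rescaled_largest_eigenvalue(n, p, seed=0):
    """Return (lambda_max - mu_{n,p}) / sigma_{n,p} for one sample."""
    mu, sigma = centering_and_scaling(n, p)
    lam = largest_eigenvalue(n, p, random.Random(seed))
    return (lam - mu) / sigma
```

Repeating `rescaled_largest_eigenvalue` over many seeds and plotting a histogram would give an empirical picture of the limit law for moderate $n$ and $p$.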

Remark 2 The Tracy-Widom distribution was discovered by Tracy and Widom
in [33], [34]. They found that the limiting distribution of the (properly
rescaled) largest eigenvalue of a Gaussian symmetric (Gaussian Hermitian)
matrix is given by $F_1$ ($F_2$), where

$$F_1(x) = \exp\left\{-\frac{1}{2}\int_x^{\infty} q(t) + (t - x)q^2(t)\, dt\right\}, \tag{1.14}$$

$$F_2(x) = \exp\left\{-\int_x^{\infty} (t - x)q^2(t)\, dt\right\}, \tag{1.15}$$

and $q(x)$ solves the Painleve II differential equation

$$d^2 q(x)/dx^2 = x q(x) + 2q^3(x), \tag{1.16}$$
$$q(x) \sim \mathrm{Ai}(x) \ \text{as} \ x \to +\infty, \tag{1.17}$$

where $\mathrm{Ai}(x)$ is the Airy function. Tracy and Widom also derived
expressions for the limiting distributions of the second largest, third
largest, etc. eigenvalues. Since their discovery the field has exploded with a
number of fascinating papers with applications to combinatorics,
representation theory, probability, statistics and mathematical physics, in
which the Tracy-Widom law appears as a limiting distribution (for recent
surveys we refer the reader to [1], [10]).

Remark 3 It should be noted that Johansson studied the complex case and
Johnstone the real case. Johnstone also gave an alternative proof in the
complex case. We also note that Johnstone has $n - 1$ instead of $n$ in the
centering and scaling constants $\mu_{n,p}$, $\sigma_{n,p}$ in the real case.
While this change clearly does not affect the limiting distribution of the
largest eigenvalues, the choice of $n - 1$ is more natural if one uses the
asymptotics of Laguerre polynomials in the proof.

Remark 4 At a physical level of rigor, results similar to those of the
Johansson-Johnstone Theorem (in the complex case) were derived by
Forrester in [13].

While it was not specifically pointed out there, the results obtained in [21]
imply that the joint distribution of the first, second, third, ..., $k$-th,
$k = 1, 2, \ldots$, largest eigenvalues converges (after the rescaling (1.12),
(1.13)) to the limiting distribution derived by Tracy and Widom in [33] and
[34]. In the complex case one can think about the limiting distribution as the
distribution of the first $k$ (from the right) particles in the determinantal
random point field with the correlation kernel given by the Airy kernel (2.8).
We remind the reader that a random point field is called determinantal with a
correlation kernel $S(x, y)$ if its correlation functions are given by

$$\rho_k(x_1, \ldots, x_k) = \det_{1 \le i, j \le k} S(x_i, x_j), \quad k = 1, 2, \ldots \tag{1.18}$$

(for more information on determinantal random point fields we refer the reader
to [31]). In the real case the situation is slightly more complicated
(correlation functions are given by the square roots of determinants, see
Section 2, Lemma 1 and Remark 6). We claim the following result to be true:

Theorem 1 The joint distribution of the first, second, third, etc. largest
eigenvalues (rescaled as in (1.12), (1.13)) of a real (complex) Wishart matrix
converges to the distribution given by the Tracy-Widom law (i.e. the limiting
distribution of the first, second, etc. rescaled eigenvalues for GOE
($\beta = 1$, real case) or GUE ($\beta = 2$, complex case), correspondingly).

Theorem 1 is proved in Section 2. Our next result generalizes Theorem
1 to the non-Gaussian case, provided $n - p = O(p^{1/3})$.

Theorem 2 Let a real (complex) sample covariance matrix satisfy the
conditions (i)-(iv) ((i')-(iv')) and $n - p = O(p^{1/3})$. Then the joint
distribution of the first, second, third, etc. largest eigenvalues (rescaled
as in (1.12), (1.13)) converges to the Tracy-Widom law with $\beta = 1$ (2).

A similar result for Wigner random matrices was proven in [30]. For other
results on universality in random matrices we refer the reader to [26], [11],
[7], [20], [8], [22].

While we expect the result of Theorem 2 to be true whenever
$n/p \to \gamma > 0$, we do not know at this moment how to extend our
technique to the case of general $\gamma$. In this paper we settle for a
weaker result.

Theorem 3 Let a real (complex) sample covariance matrix satisfy (i)-(iv)
((i')-(iv')) and $n/p \to \gamma > 0$. Then

a) E Traced = ^^ (l + o(l)) if m = o(^). 

b) E Traced™ = 0(^%) if m = 0(^). 

As a corollary of Theorem 3 we have

Corollary 1

$$\lambda_{\max}(A_p) \le \mu_{n,p} + O(p^{1/2}\log(p)) \quad \text{(a.e.)}.$$



We prove Theorem 1 in Section 2, Theorem 2 in Section 3 and Theorem 
3 and Corollary 1 in Section 4. 

The author would like to thank Craig Tracy for useful conversations. 

2 Wishart Distribution

The analysis in the Gaussian cases is simplified a great deal by the exact
formulas for the joint distribution of the eigenvalues and the $k$-point
correlation functions, $k = 1, 2, \ldots$. In the complex case the density of
the joint distribution of the eigenvalues is given by ([18]):

$$P_p(x_1, \ldots, x_p) = c_{n,p}^{-1} \prod_{1 \le i < j \le p} (x_i - x_j)^2 \prod_{j=1}^{p} x_j^{\alpha_p} \exp(-x_j), \quad \alpha_p = n - p, \tag{2.1}$$

where $c_{n,p}$ is a normalization constant. Using a standard argument from
Random Matrix Theory ([24]) one can rewrite $P_p(x_1, \ldots, x_p)$ as

$$\frac{1}{p!} \det_{1 \le i, j \le p} S_p(x_i, x_j), \tag{2.2}$$



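where

$$S_p(x, y) = \sum_{j=0}^{p-1} \varphi_j^{(\alpha_p)}(x)\, \varphi_j^{(\alpha_p)}(y) \tag{2.3}$$

is the reproducing (Christoffel-Darboux) kernel of the Laguerre
orthonormalized system

$$\varphi_j^{(\alpha_p)}(x) = \sqrt{\frac{j!}{(j + \alpha_p)!}}\; x^{\alpha_p/2} \exp(-x/2)\, L_j^{\alpha_p}(x), \tag{2.4}$$

and $L_j^{\alpha_p}$ are the Laguerre polynomials ([32]). This allows one to
write the $k$-point correlation functions as

$$\rho_k^{(p)}(x_1, \ldots, x_k) = \det_{1 \le i, j \le k} S_p(x_i, x_j), \quad k = 1, 2, \ldots, p \tag{2.5}$$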

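(for more information on correlation functions we refer the reader to [24],
[35], [36]). As a by-product of the results in [21], Johnstone showed that
after the rescaling

$$x = \mu_{n,p} + \sigma_{n,p} s \tag{2.6}$$

the (rescaled) kernel

$$\sigma_{n,p}\, S_p(\mu_{n,p} + \sigma_{n,p} s_1, \ \mu_{n,p} + \sigma_{n,p} s_2) \tag{2.7}$$

converges to the Airy kernel

$$S(s_1, s_2) = \frac{\mathrm{Ai}(s_1)\mathrm{Ai}'(s_2) - \mathrm{Ai}'(s_1)\mathrm{Ai}(s_2)}{s_1 - s_2} = \int_0^{+\infty} \mathrm{Ai}(s_1 + t)\, \mathrm{Ai}(s_2 + t)\, dt. \tag{2.8}$$

The convergence is pointwise and also in the trace norm on any $(t, \infty)$,
$t \in \mathbb{R}^1$.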

In the real Wishart case the formula for the joint distribution of the
eigenvalues was independently discovered by several groups of statisticians
at the end of the thirties (see [25], [39]):

$$P_p(x_1, \ldots, x_p) = \mathrm{const}_{n,p}^{-1} \prod_{1 \le i < j \le p} |x_i - x_j| \prod_{j=1}^{p} x_j^{\alpha_p/2} \exp(-x_j/2), \quad \alpha_p = n - 1 - p \tag{2.9}$$

(note that in the real case $\alpha_p = n - 1 - p$, while in the complex case
it was $n - p$). The $k$-point correlation function has a form similar to
(2.2), (2.3); however, it is now equal to a square root of the determinant,
and $K_p(x, y)$ is a $2 \times 2$ matrix kernel (see e.g. [38], [21]):

$$\rho_k^{(p)}(x_1, \ldots, x_k) = \left(\det_{1 \le i, j \le k} K_p(x_i, x_j)\right)^{1/2}, \quad k = 1, \ldots, p, \tag{2.10}$$

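where (in the even $p$ case)

$$K_p^{(1,1)}(x, y) = S_p(x, y) + \psi(x)(\epsilon\phi)(y), \tag{2.11}$$
$$K_p^{(1,2)}(x, y) = (S_p D)(x, y) - \psi(x)\phi(y), \tag{2.12}$$
$$K_p^{(2,1)}(x, y) = (\epsilon S_p)(x, y) - \epsilon(x - y) + (\epsilon\psi)(x)(\epsilon\phi)(y), \tag{2.13}$$
$$K_p^{(2,2)}(x, y) = K_p^{(1,1)}(y, x), \tag{2.14}$$

the operator $\epsilon$ denotes convolution with the kernel
$\epsilon(x - y) = \frac{1}{2}\mathrm{sign}(x - y)$,
$(SD)(x, y) = -\frac{\partial}{\partial y}S(x, y)$,
and $\psi(x)$, $\phi(x)$ are defined as follows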

m = (-i) p (p(p ^ p))1/4 (v ^ p(:g) _ y/p+°&-&) ( 2 - 16 ) 

Ux) = vt P \x)/x. (2.17) 

Remark 5 The formulas for $K_p(x, y)$ in the odd $p$ case are slightly
different. However, since we are interested in the asymptotic behavior of the
largest eigenvalues, it is enough to consider only the even $p$ case. Indeed,
one can carry out very similar calculations in the odd $p$ case and obtain the
same limiting kernel $K(x, y)$ as in Lemma 1. Alternatively, one may observe
that the limiting distribution of the largest (rescaled) eigenvalues must be
the same in the even $p$ and odd $p$ cases, as implied by the following
argument. Consider an $(n + p) \times (n + p)$ real symmetric (self-adjoint)
matrix $B = (b_{ij})$, $1 \le i, j \le n + p$,

$$b_{ij} = \begin{cases} x_{i, j-n}, & \text{if } 1 \le i \le n, \ n + 1 \le j \le n + p, \\ x_{j, i-n}, & \text{if } n + 1 \le i \le n + p, \ 1 \le j \le n, \\ 0, & \text{otherwise}. \end{cases}$$

Then the non-zero eigenvalues of $B^2$ and $X^*X$ coincide. If we now consider
a matrix $\tilde{X}$ obtained by deleting the first row and the last column of
$X$ and construct the corresponding matrix $\tilde{B}$, then by the mini-max
principle we have $\lambda_k(B) \ge \lambda_k(\tilde{B})$, $k = 1, 2, \ldots$.
Repeating this procedure once more we see that the $k$-th eigenvalue of $X^*X$
for odd $p$ is sandwiched between the $k$-th eigenvalues for $p + 1$ and
$p - 1$.
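
The relation between $B$ and $X^*X$ can be checked on a small example. Since $B^2$ is block diagonal with blocks $XX^t$ and $X^tX$ (real case), one has $\mathrm{Trace}\, B^{2k} = 2\, \mathrm{Trace}\, (X^tX)^k$ for every $k \ge 1$. The sketch below verifies this trace identity exactly for an integer matrix; the helper names and the block layout of `build_B` (the symmetric completion of the upper-right block) are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace_of_power(A, k):
    """Trace of A^k for a square matrix A and k >= 1."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return sum(P[i][i] for i in range(len(A)))


def build_B(X):
    """Symmetric (n + p) x (n + p) matrix with X in the upper-right block
    and its transpose in the lower-left block."""
    n, p = len(X), len(X[0])
    B = [[0] * (n + p) for _ in range(n + p)]
    for i in range(n):
        for j in range(p):
            B[i][n + j] = X[i][j]
            B[n + j][i] = X[i][j]
    return B


def gram(X):
    """The p x p matrix X^t X."""
    Xt = [list(col) for col in zip(*X)]
    return matmul(Xt, X)
```

Equality of these traces for all $k$ reflects that every non-zero eigenvalue of $X^tX$ appears in the spectrum of $B^2$ with doubled multiplicity.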

The machinery developed in [21] allows us to obtain the following result
about the pointwise convergence of the entries of $K_p(x, y)$.

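Lemma 1

a)
$$\sigma_{n,p} K_p^{(1,1)}(\mu_{n,p} + \sigma_{n,p} s_1, \ \mu_{n,p} + \sigma_{n,p} s_2) \to S(s_1, s_2) + \frac{1}{2}\mathrm{Ai}(s_1) \int_{-\infty}^{s_2} \mathrm{Ai}(t)\, dt, \tag{2.18}$$

$$\sigma_{n,p} K_p^{(2,2)}(\mu_{n,p} + \sigma_{n,p} s_1, \ \mu_{n,p} + \sigma_{n,p} s_2) \to S(s_2, s_1) + \frac{1}{2}\mathrm{Ai}(s_2) \int_{-\infty}^{s_1} \mathrm{Ai}(t)\, dt, \tag{2.19}$$

$$\sigma_{n,p}^2 K_p^{(1,2)}(\mu_{n,p} + \sigma_{n,p} s_1, \ \mu_{n,p} + \sigma_{n,p} s_2) \to -\frac{1}{2}\mathrm{Ai}(s_1)\mathrm{Ai}(s_2) - \frac{\partial}{\partial s_2} S(s_1, s_2), \tag{2.20}$$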

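$$K_p^{(2,1)}(\mu_{n,p} + \sigma_{n,p} s_1, \ \mu_{n,p} + \sigma_{n,p} s_2) \to -\int_0^{+\infty} du \left(\int_{s_1 + u}^{+\infty} \mathrm{Ai}(v)\, dv\right) \mathrm{Ai}(s_2 + u) - \epsilon(s_1 - s_2) + \frac{1}{2}\int_{s_2}^{s_1} \mathrm{Ai}(u)\, du + \frac{1}{2}\int_{s_1}^{+\infty} \mathrm{Ai}(u)\, du \int_{s_2}^{+\infty} \mathrm{Ai}(v)\, dv. \tag{2.21}$$

b) Convergence in (2.18)-(2.21) is uniform on
$[\tilde{s}_1, +\infty) \times [\tilde{s}_2, +\infty)$ as $p \to \infty$ for
any $\tilde{s}_1 > -\infty$, $\tilde{s}_2 > -\infty$. It is also true that the
error terms are $O(e^{-\mathrm{const}(s_1 + s_2)})$ uniformly in $p$ with some
const $> 0$.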

Remark 6

Lemma 1 implies the convergence of the rescaled $k$-point correlation
functions $\sigma_{n,p}^k \rho_k^{(p)}(x_1, \ldots, x_k)$,
$x_i = \mu_{n,p} + \sigma_{n,p} s_i$, $i = 1, \ldots, k$, $k = 1, 2, \ldots$,
to

$$\rho_k(s_1, \ldots, s_k) = \left(\det_{1 \le i, j \le k} K(s_i, s_j)\right)^{1/2},$$

where the entries of $K(s, t) = [K^{(i,j)}(s, t)]_{i,j=1,2}$ are given by the
r.h.s. of (2.18)-(2.21). The limiting correlation functions coincide with the
limiting correlation functions at the edge of the spectrum in the Gaussian
Orthogonal Ensemble (see e.g. [14]). It should also be noted that the formulas
(1.15)-(1.16) we gave in [30] for $K(s, t)$ must be replaced by (2.18)-(2.21).



Proof of Lemma 1

The proof is a consequence of (2.11)-(2.14), (1.12)-(1.13) and the asymptotic
formulas for the Laguerre polynomials $L_j^{\alpha_p}(x)$,
$\alpha_p \to \infty$, $j \sim \alpha_p$, near the turning point derived in
[21]. Below we prove (2.18) and (2.21). (2.19) immediately follows from (2.14)
and (2.18). (2.20) is established in a similar way to (2.18), (2.21). To prove
(2.18) we employ a very useful integral representation for $S_p(x, y)$ ([38]):

$$S_p(x, y) = \int_0^{+\infty} \phi(x + z)\psi(y + z) + \psi(x + z)\phi(y + z)\, dz, \tag{2.22}$$

where $\phi(x)$, $\psi(x)$ are defined in (2.15)-(2.17).

The asymptotic behavior of $\phi(x)$, $\psi(x)$ was studied by Johnstone
([21]), who proved

$$\sigma_{n,p}\, \phi(\mu_{n,p} + \sigma_{n,p} s), \ \ \sigma_{n,p}\, \psi(\mu_{n,p} + \sigma_{n,p} s) \to \frac{1}{\sqrt{2}}\mathrm{Ai}(s) \tag{2.23}$$

and that the l.h.s. of (2.24) is exponentially small for large $s_1, s_2$
(uniformly in $p$). While Johnstone stated only pointwise convergence in
(2.23), his results (see (3.7), (5.1), (5.19), (5.18), (5.22)-(5.24) and
(6.11) from [21]) actually imply that the convergence is uniform on any
$[\tilde{s}, +\infty)$. This together with (2.22) gives us

$$\sigma_{n,p}\, S_p(\mu_{n,p} + \sigma_{n,p} s_1, \ \mu_{n,p} + \sigma_{n,p} s_2) \to S(s_1, s_2), \tag{2.24}$$

where the convergence is uniform on any
$[\tilde{s}_1, \infty) \times [\tilde{s}_2, \infty)$. To deal with the second
term on the r.h.s. of (2.11),

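$$\psi(x)(\epsilon\phi)(y) = \psi(x)\left(\frac{1}{2}\int_0^{\infty} \phi(u)\, du - \int_y^{\infty} \phi(u)\, du\right), \tag{2.25}$$

we use

$$\frac{1}{2}\int_0^{\infty} \phi(u)\, du \to \frac{1}{\sqrt{2}} \tag{2.26}$$

(see [21], Appendix A7). (2.23), (2.25)-(2.26) imply

$$\sigma_{n,p}\, \psi(\mu_{n,p} + \sigma_{n,p} s_1)(\epsilon\phi)(\mu_{n,p} + \sigma_{n,p} s_2) \to \frac{1}{2}\mathrm{Ai}(s_1) \int_{-\infty}^{s_2} \mathrm{Ai}(t)\, dt. \tag{2.27}$$

This proves (2.18).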

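To establish (2.21) we consider separately $(\epsilon S_p)(x, y)$ and
$(\epsilon\psi)(x)(\epsilon\phi)(y)$. We have

$$\epsilon S_p(x, y) = \left(\frac{1}{2}\int_0^{+\infty} du - \int_x^{+\infty} du\right) \int_0^{+\infty} \phi(u + z)\psi(y + z) + \psi(u + z)\phi(y + z)\, dz \tag{2.28}$$

$$= \frac{1}{2}\int_0^{+\infty} \left(\int_z^{+\infty} \phi(u)\, du\; \psi(y + z)\right) dz \tag{2.29}$$

$$- \int_0^{+\infty} \left(\int_{x+z}^{+\infty} \phi(u)\, du\; \psi(y + z)\right) dz \tag{2.30}$$

$$+ \frac{1}{2}\int_0^{+\infty} \left(\int_z^{+\infty} \psi(u)\, du\; \phi(y + z)\right) dz \tag{2.31}$$

$$- \int_0^{+\infty} \left(\int_{x+z}^{+\infty} \psi(u)\, du\; \phi(y + z)\right) dz. \tag{2.32}$$

Let us fix $s_1, s_2$ and consider

$$x = \mu_{n,p} + \sigma_{n,p} s_1, \quad y = \mu_{n,p} + \sigma_{n,p} s_2. \tag{2.33}$$

It follows from (2.23) that each of the integrals (2.30) and (2.32) converges
to

$$-\frac{1}{2}\int_0^{+\infty} du \left(\int_{s_1 + u}^{+\infty} \mathrm{Ai}(v)\, dv\right) \mathrm{Ai}(s_2 + u).$$
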
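Let us now write (2.29) as

$$\frac{1}{2}\int_0^{+\infty} \left(\int_0^{+\infty} \phi(u)\, du\; \psi(y + z)\right) dz \tag{2.34}$$

$$- \frac{1}{2}\int_0^{+\infty} \left(\int_0^{z} \phi(u)\, du\; \psi(y + z)\right) dz \tag{2.35}$$

$$= \left(\frac{1}{\sqrt{2}} + o(1)\right) \int_0^{+\infty} \psi(y + z)\, dz \tag{2.36}$$

$$- \frac{1}{2}\int_0^{+\infty} \left(\int_0^{z} \phi(u)\, du\; \psi(y + z)\right) dz. \tag{2.37}$$

Using (2.23) one can see that (2.36) converges to
$\frac{1}{2}\int_{s_2}^{+\infty} \mathrm{Ai}(u)\, du$.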

The integral (2.37) tends to zero as $p \to \infty$. Indeed, suppose that
$n - p \to +\infty$ (the case $n - p = O(1)$ can be treated by using the
classical asymptotic formulas for Laguerre polynomials for fixed $\alpha$, see
e.g. [32]). Let us write
$\int_0^{+\infty} \left(\int_0^z \phi(u)\, du\; \psi(y + z)\right) dz$ as

$$\int_0^{\sqrt{p}} \left(\int_0^z \phi(u)\, du\; \psi(y + z)\right) dz + \int_{\sqrt{p}}^{\infty} \left(\int_0^z \phi(u)\, du\; \psi(y + z)\right) dz. \tag{2.38}$$

Similar calculations to the ones from Appendix 7 of [21] show that for
$z \le \sqrt{p}$

$$\int_0^z \phi(u)\, du = O\left((\mathrm{const}\, p)^{-(n-p)/4}\right), \quad \text{where const} > 0.$$

This estimate coupled with the following (rather rough) bounds 



rvp r°° 
I \i/>(y + z)\dz< p l/ \ / ^{zfdz) 1 ' 2 < 

Jy 

POO POO 

COnst P V4(( / ^(^2^1/2 + ( / ^^^2^)1/2) = (pl/4) 

•Jy Jy 
take care of the first term in (2.38). If $z > \sqrt{p}$ one has

$$|\psi(y + z)| = |\psi(\mu_{n,p} + \sigma_{n,p}(s_2 + z/\sigma_{n,p}))| \le \exp(-\mathrm{const}(s_2 + z/p^{1/3})), \quad \mathrm{const} > 0,$$

where we have used the exponential decay of
$\psi(\mu_{n,p} + \sigma_{n,p} s)$ for large $s$ (see (2.23), (2.15)-(2.17)
and [21], formula (5.1)). Since

$$\left|\int_0^z \phi(u)\, du\right| \le \sqrt{z}\left(\int_0^z \phi(u)^2\, du\right)^{1/2} \le \mathrm{const}\; p\sqrt{z},$$

we conclude that (2.37) is $o(1)$. Using $\int_0^{\infty} \psi(u)\, du = 0$
one can prove in a similar fashion that (2.31) is also $o(1)$. To establish
(2.21) we are left with estimating

$$(\epsilon\psi)(x)(\epsilon\phi)(y) = \left(\frac{1}{2}\int_0^{\infty} \psi(u)\, du - \int_x^{\infty} \psi(u)\, du\right)\left(\frac{1}{2}\int_0^{\infty} \phi(u)\, du - \int_y^{\infty} \phi(v)\, dv\right)$$

$$= \left(-\int_x^{\infty} \psi(u)\, du\right)\left(\frac{1}{\sqrt{2}} + o(1) - \int_y^{\infty} \phi(v)\, dv\right).$$

Using (2.23) and (2.33) we derive that the last expression converges to
$-\frac{1}{2}\int_{s_1}^{\infty} \mathrm{Ai}(u)\, du + \frac{1}{2}\int_{s_1}^{\infty} \mathrm{Ai}(u)\, du \int_{s_2}^{\infty} \mathrm{Ai}(v)\, dv$.
This finishes the proof of (2.21). To obtain (2.20) we use (2.23) and

$$\sigma_{n,p}^2\, \phi'(\mu_{n,p} + \sigma_{n,p} s), \ \ \sigma_{n,p}^2\, \psi'(\mu_{n,p} + \sigma_{n,p} s) \to \frac{1}{\sqrt{2}}\mathrm{Ai}'(s), \tag{2.39}$$

which follows from the machinery developed in [21]. Lemma 1 is proven.

Theorem 1 now follows from

Lemma 2 Suppose that we are given random point fields $F$, $F_n$,
$n = 1, 2, \ldots$, with the $k$-point correlation functions
$\rho_k(x_1, \ldots, x_k)$, $\rho_k^{(n)}(x_1, \ldots, x_k)$,
$k = 1, 2, \ldots$, such that the number of particles in $(a, \infty)$
(denoted by $\#(a, \infty)$) is finite $F$-a.e. for any $a > -\infty$ and $F$
is uniquely determined by its correlation functions. Then the following
diagram holds:

d) $\Rightarrow$ c) $\Rightarrow$ b) $\Leftrightarrow$ a),

where

a) The joint distribution of the first, second, ..., $k$-th rightmost
particles in $F_n$ converges to the joint distribution of the first, second,
..., $k$-th rightmost particles in $F$ for any $k \ge 1$.

b) The joint distribution of $\#(a_1, b_1), \ldots, \#(a_l, b_l)$, $l \ge 1$,
in $F_n$ converges to the corresponding distribution in $F$ for any collection
of disjoint intervals $(a_1, b_1), \ldots, (a_l, b_l)$, $a_j > -\infty$,
$b_j \le +\infty$, $j = 1, \ldots, l$, $l = 1, \ldots$.

c) $\rho_k(x_1, \ldots, x_k)$ is integrable on $[t, \infty)^k$ for any
$t \in \mathbb{R}^1$, $k = 1, 2, \ldots$, and

$$\int_{(a_1, b_1)^{k_1} \times \ldots \times (a_l, b_l)^{k_l}} \rho_k^{(n)}(x_1, \ldots, x_k)\, dx_1 \ldots dx_k \to \tag{2.40}$$

$$\int_{(a_1, b_1)^{k_1} \times \ldots \times (a_l, b_l)^{k_l}} \rho_k(x_1, \ldots, x_k)\, dx_1 \ldots dx_k \tag{2.41}$$

for any disjoint intervals $(a_1, b_1), \ldots, (a_l, b_l)$,
$a_j > -\infty$, $b_j \le +\infty$, $j = 1, \ldots, l$, $l = 1, \ldots, k$,
$k_1 + \ldots + k_l = k$, $k = 1, 2, \ldots$.

d) For any $k \ge 1$ the Laplace transform

$$\int \exp\Big(\sum_{j=1,\ldots,k} t_j x_j\Big) \rho_k(x_1, \ldots, x_k)\, dx_1 \ldots dx_k$$

is finite for $t_1 \in [c_1^{(k)}, d_1^{(k)}], \ldots, t_k \in [c_k^{(k)}, d_k^{(k)}]$,
where $c_j^{(k)} < d_j^{(k)}$, $d_j^{(k)} > 0$, $j = 1, \ldots, k$, and

$$\int \exp\Big(\sum_{j=1,\ldots,k} t_j x_j\Big) \rho_k^{(n)}(x_1, \ldots, x_k)\, dx_1 \ldots dx_k \to \tag{2.42}$$

$$\int \exp\Big(\sum_{j=1,\ldots,k} t_j x_j\Big) \rho_k(x_1, \ldots, x_k)\, dx_1 \ldots dx_k \tag{2.43}$$

for such $t_1, \ldots, t_k$ as $n \to \infty$.

Proof of Lemma 2

d) $\Rightarrow$ c)

Suppose that d) holds. Fix some positive
$t_1 \in (c_1^{(k)}, d_1^{(k)}), \ldots, t_k \in (c_k^{(k)}, d_k^{(k)})$.
Denote by $H_n(dx_1, \ldots, dx_k)$, $H(dx_1, \ldots, dx_k)$ the probability
measures on $\mathbb{R}^k$ with the densities

$$h_n(x_1, \ldots, x_k) = Z_n^{-1} \exp\Big(\sum_{j=1,\ldots,k} t_j x_j\Big) \rho_k^{(n)}(x_1, \ldots, x_k),$$

$$h(x_1, \ldots, x_k) = Z^{-1} \exp\Big(\sum_{j=1,\ldots,k} t_j x_j\Big) \rho_k(x_1, \ldots, x_k),$$

where $Z_n$, $Z$ are the normalization constants (it is easy to see that
$Z_n \to Z$). The constructed sequence of probability measures is tight (by
the Helly theorem); moreover, their distributions decay (at least)
exponentially for large (positive and negative) $x_1, \ldots, x_k$ uniformly
in $n$. It follows from the tightness of $\{H_n\}$ that all we have to show is
that any limiting point of $H_n$ coincides with $H$. Suppose that a
subsequence of $H_n$ weakly converges to $\tilde{H}$. Then $\tilde{H}$ must
have a finite Laplace transform for
$c_1^{(k)} - t_1 < \mathrm{Re}\, t_1 < d_1^{(k)} - t_1, \ldots, c_k^{(k)} - t_k < \mathrm{Re}\, t_k < d_k^{(k)} - t_k$,
and the Laplace transforms of $H_n$ must converge to the Laplace transform of
$\tilde{H}$ in this strip. Since the Laplace transforms of $H$, $\tilde{H}$
are analytic in the strip and coincide for
$t_1 \in [c_1^{(k)}, d_1^{(k)}], \ldots, t_k \in [c_k^{(k)}, d_k^{(k)}]$, they
must coincide in the whole strip. Applying the inverse Laplace transform we
obtain that $\tilde{H}$ coincides with $H$. It follows then that

$$\int_{(a_1, b_1) \times \ldots \times (a_k, b_k)} \rho_k^{(n)}(x_1, \ldots, x_k)\, dx_1 \ldots dx_k \to \int_{(a_1, b_1) \times \ldots \times (a_k, b_k)} \rho_k(x_1, \ldots, x_k)\, dx_1 \ldots dx_k$$

for any finite $a_j < b_j$, $j = 1, \ldots, k$, and the exponential decay of
$\rho_k^{(n)}(x_1, \ldots, x_k)$, $\rho_k(x_1, \ldots, x_k)$ for large
positive $x_1, \ldots, x_k$ implies that this still holds for
$b_j = +\infty$, $j = 1, \ldots, k$.

c) $\Rightarrow$ b)

We remind the reader that the integral in (2.40) is equal to the
$(k_1, \ldots, k_l)$-th factorial moment

$$E \prod_{j=1}^{l} \frac{(\#(a_j, b_j))!}{(\#(a_j, b_j) - k_j)!}$$

of the numbers of particles in the disjoint intervals
$(a_1, b_1), \ldots, (a_l, b_l)$. Since the probability distribution of the
random point field $F$ is uniquely determined by its correlation functions,
the joint distribution of the numbers of particles in the boxes is uniquely
determined by the moments, and therefore the convergence of moments implies
the convergence of the distributions of $\#(a_1, b_1), \ldots, \#(a_l, b_l)$.

b) $\Leftrightarrow$ a)

Trivial. Observe that

$$P(\lambda_1 \le s_1, \lambda_2 \le s_2, \ldots, \lambda_k \le s_k) = P(\#(s_1, +\infty) = 0, \ \#(s_2, +\infty) \le 1, \ldots, \#(s_k, +\infty) \le k - 1).$$

Lemma 2 is proven.
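
The identity used in the step b) $\Leftrightarrow$ a) holds configuration by configuration: the $j$-th rightmost particle is at most $s_j$ exactly when at most $j - 1$ particles lie in $(s_j, +\infty)$. The sketch below checks this on finite point sets; the function names are illustrative assumptions, and each configuration is assumed to contain at least as many points as there are thresholds.

```python
def event_by_ordered_particles(points, thresholds):
    """True if the j-th rightmost point is <= s_j for every threshold s_j."""
    ordered = sorted(points, reverse=True)
    return all(ordered[j] <= s for j, s in enumerate(thresholds))


def event_by_counts(points, thresholds):
    """True if #(s_j, +infinity) <= j - 1 for every threshold s_j (j = 1, 2, ...)."""
    return all(
        sum(1 for x in points if x > s) <= j
        for j, s in enumerate(thresholds)  # enumerate is 0-based, so j plays the role of j - 1
    )
```

Both predicates describe the same event, so equality of the two probabilities follows by taking expectations.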

Proof of Theorem 1 It is worth noting that the limiting random point
fields defined in (1.18), (2.8) (complex case) and in Remark 6 (real case)
are uniquely determined by their correlation functions (see e.g. [31]). To
prove Theorem 1 in the complex case we use a general fact for the ensembles
with determinantal correlation functions: the generating function of the
numbers of particles in the boxes is given by the Fredholm determinant

$$E \prod_{j=1,\ldots,l} z_j^{\#(a_j, b_j)} = \det\Big(\mathrm{Id} + \sum_{j=1,\ldots,l} (z_j - 1) S_p\, \chi_{(a_j, b_j)}\Big) \tag{2.44}$$

(see e.g. [35], [31]), where $\chi_{(a,b)}$ is the operator of multiplication
by the indicator of $(a, b)$. Trace class convergence of the rescaled $S_p$ to
the Airy kernel on any $(a, \infty)$, $a > -\infty$, implies the convergence
of the Fredholm determinants, which together with Lemma 2 proves Theorem 1 in
the complex case. To prove Theorem 1 in the real case we observe that Lemma 1
implies that after the rescaling $x_i = \mu_{n,p} + \sigma_{n,p} s_i$,
$i = 1, 2, \ldots$, condition (2.40)-(2.41) of Lemma 2, part c) is satisfied.
Theorem 1 is proven.



3 Proof of Theorem 2 

The proof of Theorem 2 relies heavily on the results obtained in [28]-[30].
We start with

Lemma 3 Let $A_p$ be either a real sample covariance matrix (i)-(iv) or a
complex sample covariance matrix ((i')-(iv')) and $n - p = O(p^{1/3})$ as
$p \to \infty$. Then there exists some const $> 0$ such that for any
$t_1, \ldots, t_k > 0$ and

$$m_p^{(1)} = [t_1 \cdot p^{2/3}], \ldots, m_p^{(k)} = [t_k \cdot p^{2/3}]$$

the following estimate holds:

a)
$$E \prod_{i=1}^{k} \mathrm{Trace}\, A_p^{m_p^{(i)}} \le \mathrm{const}^k \prod_{i=1}^{k} \frac{\mu_{n,p}^{m_p^{(i)}}}{t_i^{3/2}} \exp\Big(\mathrm{const} \sum_{i=1}^{k} t_i^3\Big). \tag{3.1}$$

b) If $A_p$, $\tilde{A}_p$ belong to two different ensembles of random real
(complex) sample covariance matrices satisfying (i)-(iv) ((i')-(iv')), and
$n - p = O(p^{1/3})$, then



E Trace A™^ -Ej] Trace A^ (3.2) 



tends to zero as p — > oo. 

Proof of Lemma 3

Lemma 3 is the analogue of Theorem 3 in [30] and is proved in the same
way. Since the real and the complex cases are very similar, we will consider
here only the real case. As we explained earlier, we can assume without loss
of generality that $p \le n$. Our arguments will be most transparent when
$k = 1$ and the matrix entries $\{x_{ij}\}$, $1 \le i \le n$,
$1 \le j \le p$, are identically distributed. Construct an $n \times n$ random
real symmetric Wigner matrix $M_n = (y_{ij})$, $1 \le i, j \le n$, such that
$y_{ij} = y_{ji}$, $i \le j$, are independent identically distributed random
variables with the same distribution as the entries $x_{ij}$. Then

$$E\, \mathrm{Trace}\, A_p^{m_p} \le E\, \mathrm{Trace}\, M_n^{2m_p}. \tag{3.3}$$

To see this we consider separately the left and the right hand sides of the
inequality. We start by calculating the mathematical expectation of
$\mathrm{Trace}\, A_p^{m_p}$. Clearly,

$$E\, \mathrm{Trace}\, A_p^{m_p} = \sum_{\mathcal{P}} E\, x_{i_1, i_0} x_{i_1, i_2} x_{i_3, i_2} x_{i_3, i_4} \ldots x_{i_{2m_p - 1}, i_{2m_p - 2}} x_{i_{2m_p - 1}, i_0}. \tag{3.4}$$

The sum in (3.4) is taken over all closed paths
$\mathcal{P} = \{i_0, i_1, \ldots, i_{2m_p - 1}, i_0\}$, with a distinguished
origin, in the set $\{1, 2, \ldots, n\}$ with the condition

C1. $i_t \in \{1, 2, \ldots, p\}$ for odd $t$

satisfied. We consider the set of vertices $\{1, 2, \ldots, n\}$ as a
nonoriented graph in which any two vertices are joined by an unordered edge.
Since the distributions of the random variables $x_{ij}$ are symmetric, we
conclude that if a path $\mathcal{P}$ gives a nonzero contribution to (3.4)
then the following condition C2 must also hold:

C2. The number of occurrences of each edge is even.

Indeed, due to the independence of $\{x_{ij}\}$, the mathematical expectation
of the product factorizes as a product of mathematical expectations of random
variables corresponding to different edges of the path. Therefore, if some
edge appears in $\mathcal{P}$ an odd number of times, at least one factor in
the product will be zero. Condition C2 is a necessary but not sufficient
condition on $\mathcal{P}$ to give a non-zero contribution in (3.4). To obtain
a necessary and sufficient condition let us note that an edge $i_k = j$,
$i_{k+1} = g$, $k = 0, \ldots, 2m_p - 1$, contributes $x_{jg}$ for odd $k$
and $x_{gj}$ for even $k$. Clearly the number of appearances in each non-zero
term of (3.4) must be even both for $x_{jg}$ and $x_{gj}$. This leads to

C3. For any edge $\{j, g\}$, $j, g \in \{1, 2, \ldots, n\}$, the number of
times we pass $\{j, g\}$ in the direction $j \to g$ at odd moments of time
$2k + 1$, $k = 0, 1, \ldots, m_p$, plus the number of times we pass
$\{j, g\}$ in the direction $g \to j$ at even moments of time $2k$,
$k = 0, 1, \ldots$, must be even.
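
The path expansion (3.4) can be tested by brute force when the entries are independent symmetric signs $\pm 1$ (an illustrative choice of distribution satisfying (i)-(iv)). In that case a path contributes 1 if every matrix entry is used an even number of times and 0 otherwise, so $E\, \mathrm{Trace}\, A_p^m$ equals the number of admissible paths. The sketch below compares this count with an exhaustive average over all sign matrices; it stores entries as `x[row][column]` with rows indexed by the even-time vertices, and all names are assumptions made for this example.

```python
from itertools import product


def count_contributing_paths(n, p, m):
    """Count closed paths i_0, ..., i_{2m-1}, i_0 with i_t in {0..p-1} for odd t
    (condition C1) in which every matrix entry occurs an even number of times."""
    ranges = [range(n) if t % 2 == 0 else range(p) for t in range(2 * m)]
    count = 0
    for path in product(*ranges):
        uses = {}
        for t in range(2 * m):
            a, b = path[t], path[(t + 1) % (2 * m)]
            entry = (a, b) if t % 2 == 0 else (b, a)  # always (row, column)
            uses[entry] = uses.get(entry, 0) + 1
        if all(c % 2 == 0 for c in uses.values()):
            count += 1
    return count


def trace_sum_over_sign_matrices(n, p, m):
    """Sum of Trace (X X^t)^m = Trace (X^t X)^m over all 2^(n p) sign matrices X."""
    total = 0
    for signs in product((-1, 1), repeat=n * p):
        X = [signs[i * p:(i + 1) * p] for i in range(n)]
        G = [[sum(X[i][k] * X[j][k] for k in range(p)) for j in range(n)] for i in range(n)]
        P = G
        for _ in range(m - 1):
            P = [[sum(P[i][k] * G[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        total += sum(P[i][i] for i in range(n))
    return total
```

Dividing `trace_sum_over_sign_matrices(n, p, m)` by `2 ** (n * p)` gives the expectation, which should match `count_contributing_paths(n, p, m)`. The enumeration is exponential in `n * p` and `m`, so it is only meant for very small cases.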
Let us now consider the r.h.s. of (3.3). We can write

$$E\, \mathrm{Trace}\, M_n^{2m_p} = \sum_{\mathcal{P}} E\, y_{i_0, i_1} y_{i_1, i_2} y_{i_2, i_3} \ldots y_{i_{2m_p - 2}, i_{2m_p - 1}} y_{i_{2m_p - 1}, i_0}, \tag{3.5}$$

where the sum again is over all closed paths
$\mathcal{P} = \{i_0, i_1, \ldots, i_{2m_p - 1}, i_0\}$, with a distinguished
origin, in the set $\{1, 2, \ldots, n\}$. Since $M_n$ is a square
$n \times n$ real symmetric matrix, conditions C1 and C3 are no longer needed.
In particular, the necessary and sufficient condition on a path $\mathcal{P}$
to give a non-zero contribution to (3.5) is C2. It does not matter in which
direction we pass an edge $\{j, g\}$, because both steps $j \to g$ and
$g \to j$ give us $y_{jg} = y_{gj}$. Using the inequalities
$E\, x_{jg}^{2r} x_{gj}^{2q} \le E\, y_{jg}^{2r + 2q}$ we show that each term
in (3.4) is not greater than the corresponding term in (3.5) and, therefore,
obtain (3.3). (3.1) (in the case $k = 1$) then immediately follows from
Theorem 3 of [30] (the matrix $A_n$ considered there differs from $M_n$ by a
normalization factor). In the general case the proof of (3.1)-(3.2) is
essentially identical to the one given in [30]. In particular, part b) of
Lemma 3 follows from the fact that the l.h.s. of (3.2) is given by a subsum
over paths that, in addition to C1-C3, have at least one edge that appears
four times or more. As we showed in [30], the contribution of such paths tends
to zero as $n \to \infty$. Lemma 3 is proven.

Remark 7

If the condition $n - p = O(p^{1/3})$ in Lemma 3 and Theorem 2 is not
satisfied, the machinery from [28]-[30] does not work, essentially for the
following reason: when we decide which vertex to choose at the moment of
self-intersection (as explained in Section 4 of [29]), the number of choices
for odd moments of time is smaller because of the constraint C1. If we now
use the same bound as for the even moments of time (the one similar to the
bound at the bottom of p. 725 of [29]), the estimate becomes rough when
$n - p$ is much greater than $p^{1/3}$. Therefore new combinatorial ideas are
needed.

As corollaries of Lemma 3 we obtain

Corollary 2 There exists const $> 0$ such that for any $s = o(p^{1/3})$

$$P(\lambda_1(A_p) > \mu_{n,p} + \sigma_{n,p} s) < \mathrm{const}\, \exp(-\mathrm{const}\, s).$$

Corollary 3

$$\int_{(-\infty,\ \mu_{n,p} + \sigma_{n,p} p^{1/6}]^k} \exp\Big(\sum_{j=1,\ldots,k} t_j s_j\Big) \rho_k^{(p)}(s_1, \ldots, s_k)\, ds_1 \ldots ds_k \to \tag{3.6}$$

$$\int_{\mathbb{R}^k} \exp\Big(\sum_{j=1,\ldots,k} t_j s_j\Big) \rho_k(s_1, \ldots, s_k)\, ds_1 \ldots ds_k \tag{3.7}$$

for any $t_1 > 0, \ldots, t_k > 0$ as $n \to \infty$, where (with a slight
abuse of notation)

$$\rho_k^{(p)}(s_1, \ldots, s_k) = (\sigma_{n,p})^k\, \rho_k^{(p)}(\mu_{n,p} + \sigma_{n,p} s_1, \ldots, \mu_{n,p} + \sigma_{n,p} s_k)$$

is the rescaled $k$-point correlation function and
$\rho_k(s_1, \ldots, s_k)$ is defined in Section 2, Remark 6 by the r.h.s. of
(2.18)-(2.21).

To prove Corollary 2 we use the Chebyshev inequality

$$P(\lambda_1(A_p) > \mu_{n,p} + \sigma_{n,p} s) \le \frac{E\, \lambda_1(A_p)^{m_p}}{(\mu_{n,p} + \sigma_{n,p} s)^{m_p}} \le \frac{E\, \mathrm{Trace}\, A_p^{m_p}}{(\mu_{n,p} + \sigma_{n,p} s)^{m_p}}$$

and Lemma 3. As a result of Corollary 2 we obtain that with probability
$1 - O(\exp(-\mathrm{const}\, p^{1/6}))$ the largest eigenvalue is not greater
than $\mu_{n,p} + \sigma_{n,p} p^{1/6}$. Therefore, it is enough to study only
the eigenvalues in $(-\infty, \mu_{n,p} + \sigma_{n,p} p^{1/6}]$ (with very
high probability there are no eigenvalues outside). To prove Corollary 3 we
first note that Lemma 3 implies
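
$$\int_{(-\infty,\ \mu_{n,p} + \sigma_{n,p} p^{1/6}]^k} \exp\Big(\sum_{j=1,\ldots,k} t_j s_j\Big) \rho_k^{(p)}(s_1, \ldots, s_k)\, ds_1 \ldots ds_k \le \tag{3.8}$$

$$\frac{\mathrm{const}^k}{\prod_{i=1}^{k} t_i^{3/2}} \exp\Big(\mathrm{const} \sum_{i=1}^{k} t_i^3\Big), \tag{3.9}$$

with some const $> 0$. To see this we write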



* j=k 



* j=k 



E E II expfo^-zv,)/^) < E £ J] (^/^fe^-^ (l+o(l)) 



where the sum $\sum^*$ is over all $k$-tuples of non-coinciding indices
$(i_1, i_2, \ldots, i_k)$, $1 \le i_j \le p$, $j = 1, \ldots, k$, such that
$\lambda_{i_j} < \mu_{n,p} + \sigma_{n,p} p^{1/6}$, $j = 1, \ldots, k$, and
apply Lemma 3, a). Part b) of Lemma 3 implies that the differences between
the left hand sides of (3.8) for different ensembles of random matrices
(i)-(iv) ((i')-(iv')) tend to 0. Finally, we note that in the Gaussian case
the l.h.s. of (3.8) converges, which in turn implies the convergence for an
arbitrary ensemble of sample covariance matrices. For the details we refer
the reader to the analogous arguments in [30]. Corollary 3 is proven.
Theorem 2 now follows from Lemma 2, part d) and Corollary 3.

4 Proof of Theorem 3 

In order to estimate the r.h.s. of (3.4) we assume some familiarity of the
reader with the combinatorial machinery developed in [28]-[30]. In particular,
we refer the reader to [28] (Section 2, Definitions 1-2) or [29] (Section 4,
Definitions 1-4) for the definitions of a) marked and unmarked instants, b) a
partition of all vertices into the classes
$\mathcal{N}_0, \mathcal{N}_1, \ldots, \mathcal{N}_m$ and c) paths of the
type $(n_0, n_1, \ldots, n_m)$, where $\sum_{k=0}^{m} n_k = n$,
$\sum_{k=0}^{m} k n_k = m$ (for simplicity we omit the subindex $p$ in $m_p$
throughout this section). Let us first estimate a subsum of (3.4) over the
paths of some fixed type $(n_0, n_1, \ldots, n_m)$. Essentially repeating the
arguments from [28]-[29] we can bound it from above



k 



<E J] Trace Ap" n '" /tTn ^ '.K p K,p(i + (i)) 



i 
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by 



( \\ . m 

n W...n m \U k=2 (k\)^l\ ^ 

where the sum $\sum_{X \in \Omega_m}$ is over all possible trajectories

$$X = \{x(t) \ge 0,\ x(t+1)-x(t) = -1, +1,\ t = 0, \dots, 2m-1,\ x(0) = x(2m) = 0\}$$

and $\#(X) = \#(t : x(t+1) - x(t) = +1,\ t = 2k,\ k = 0, \dots, m-1)$.
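
For small $m$ the trajectories $X$ and the statistic $\#(X)$ can be enumerated directly. The following is a minimal sketch in Python; the function names are ours and purely illustrative, and the brute-force enumeration is only meant for small $m$ (it visits all $2^{2m}$ step sequences).

```python
from itertools import product

def trajectories(m):
    """Yield all X = [x(0), ..., x(2m)] with steps +1/-1, x(t) >= 0, x(0) = x(2m) = 0."""
    for steps in product((+1, -1), repeat=2 * m):
        x, path, ok = 0, [0], True
        for s in steps:
            x += s
            if x < 0:          # trajectory left the half-line x >= 0
                ok = False
                break
            path.append(x)
        if ok and x == 0:
            yield path

def even_up_steps(path):
    """#(X): number of up-steps made at even instants t = 0, 2, ..., 2m - 2."""
    return sum(1 for t in range(0, len(path) - 1, 2)
               if path[t + 1] - path[t] == 1)

def g_coefficients(m):
    """Return c with c[k] = number of trajectories X of length 2m with #(X) = k."""
    coeffs = [0] * (m + 1)
    for path in trajectories(m):
        coeffs[even_up_steps(path)] += 1
    return coeffs
```

For example, `g_coefficients(3)` returns `[0, 1, 3, 1]`: of the five trajectories of length 6, one has a single up-step at an even instant, three have two, and one has three.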

The only differences between the estimates (4.1) in this paper and (4.4)
and (4.27) in [29] are

a) the number of ways we can choose the vertices from $\mathcal{N}_1$ is estimated from
above by $p^{n_1}(n/p)^{\#(X)}/n_1!$, not by $n(n-1)\cdots(n-n_1+1)/n_1!$, because
of the restriction C1 from the last section,

b) we have in (4.1) the factor $(\mathrm{const}_2)^{2n_2}$ instead of $3^r$ in (4.27) of [29],
which is perfectly fine since $r \le n_2$ (by $r$ we denoted in [29] the number
of so-called "non-closed" vertices from $\mathcal{N}_2$), and

c) there is no factor in (4.1) because of the different normalization. 
Let us denote by $g_m(y) = \sum_{X \in \Omega_m} y^{\#(X)}$ (observe that $g_m(1) = |\Omega_m| =
\frac{(2m)!}{m!\,(m+1)!}$ are just Catalan numbers).

Consider the generating function $G(z,y) = \sum_{m=0}^{\infty} g_m(y) z^m$, $g_0(y) = 1$. It
is not difficult to see (by representing $g_m(y)$ as a sum over the first instants
of the return of the trajectory to the origin) that

$$G(z,y) = 1 + yz\,G'(z,y)\,G(z,y), \qquad (4.2)$$

$$G'(z,y) = 1 + z\,G'(z,y)\,G(z,y), \qquad (4.3)$$

where

$$G'(z,y) = \sum_{m=0}^{\infty} g'_m(y) z^m, \qquad g'_m(y) = \sum_{X \in \Omega_m} y^{\#'(X)} \qquad \text{and}$$

$$\#'(X) = \#(t : x(t+1) - x(t) = +1,\ t = 2k+1,\ k = 0, \dots, m-1).$$
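
The two first-return identities can be checked coefficient by coefficient: comparing powers of $z$, they say that $g_m(y) = y \sum_{a+b=m-1} g'_a(y)\,g_b(y)$ and $g'_m(y) = \sum_{a+b=m-1} g'_a(y)\,g_b(y)$ for $m \ge 1$. The sketch below verifies this numerically for an integer value of $y$. The helper names and the dynamic programming over heights are our own illustrative choices, not part of the argument.

```python
def weighted_count(m, y, parity):
    """Sum over trajectories X of length 2m of y**(number of up-steps
    made at instants t with t % 2 == parity).
    parity = 0 gives g_m(y), parity = 1 gives g'_m(y)."""
    weights = {0: 1}  # weights[h] = total weight of prefixes ending at height h
    for t in range(2 * m):
        up_weight = y if t % 2 == parity else 1
        nxt = {}
        for h, w in weights.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + w * up_weight  # up-step
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + w          # down-step
        weights = nxt
    return weights.get(0, 0)

def check_first_return_identities(max_m, y):
    """Check g_m = y * conv_m and g'_m = conv_m for 1 <= m <= max_m,
    where conv_m = sum_{a+b=m-1} g'_a g_b."""
    g = [weighted_count(m, y, 0) for m in range(max_m + 1)]
    gp = [weighted_count(m, y, 1) for m in range(max_m + 1)]
    for m in range(1, max_m + 1):
        conv = sum(gp[a] * g[m - 1 - a] for a in range(m))
        if g[m] != y * conv or gp[m] != conv:
            return False
    return True
```

With integer $y$ all arithmetic is exact, so `check_first_return_identities(8, 3)` is a genuine identity check rather than a floating-point comparison.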

Solving (4.2) we obtain

$$G(z,y) = \frac{-yz + z + 1 - \sqrt{((y-1)z-1)^2 - 4z}}{2z} = \frac{-yz + z + 1 - (y-1)\sqrt{(z-z_1)(z-z_2)}}{2z}, \qquad (4.4)$$

where $z_1 = 1/(\sqrt{y}+1)^2$, $z_2 = 1/(\sqrt{y}-1)^2$, and we take the branch
of $\sqrt{(z-z_1)(z-z_2)}$, analytic everywhere outside $[z_1, z_2]$, such that
$\sqrt{(0-z_1)(0-z_2)} = 1/(y-1)$. Therefore

$$g_m(y) = -\frac{y-1}{4\pi i} \oint_{|z| = z_1 - \epsilon} \frac{\sqrt{(z-z_1)(z-z_2)}}{z^{m+2}}\, dz, \qquad m \ge 1, \qquad (4.5)$$

where the integration is counter-clockwise. An exercise in complex analysis 
gives us 

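$$\oint_{|z| = z_1 - \epsilon} \frac{\sqrt{(z-z_1)(z-z_2)}}{z^{m+2}}\, dz = -\frac{2i\sqrt{\pi}\; y^{1/4}(\sqrt{y}+1)^{2m+1}}{(y-1)\, m^{3/2}}\,(1+o(1)). \qquad (4.6)$$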

Therefore

$$g_m(y) = \frac{y^{1/4}(\sqrt{y}+1)^{2m+1}}{2\sqrt{\pi}\; m^{3/2}}\,(1+o(1)), \qquad (4.7)$$

and the subsum of (3.4) over the paths of the type $(n_0, n_1, \dots, n_m)$ is
bounded from above by



(n/p) 1/4 ( v / n/p+ 1) +1 (w-wQ! m! 



2v^F " nolnil.-.r^irKL^A;!)"* 

n(const^ (v/ ^3/t ir a + ^))< 



rra 

fc=2 



ra 



/ P ) 1/4 (y^+l) P/C, _J_ (!LZ!^ ^ ! fTfconst^- (1 + o(D) 

20F m 3 /2 no!ni! . . . nJ nr=#o nfc tv 

(4.8) 

(the constant const may have changed). Using the inequality $m! \le n_1!\, m^{m-n_1}$
and $\sum_{k=1}^{m} k\,n_k = m$, $\sum_{k=1}^{m} n_k = n - n_0$ we obtain



f4g x < (»/p) 1/4 (>/^7p + 1) PjC n -E 2 fc ^ nE -n fc v^n, TT (const*)"* 



(ra/p) 1 / 4 ( v /ra7p + 1) g < p ,yr J_ , (const ro)* ^ 



(4.9) 
Now we can estimate the sum of (4.9) over $(n_0, n_1, \dots, n_m)$,
$0 < \sum_{k=2}^{m} k\,n_k \le m$, as

V fc=2 

Since for $m = o(p^{1/2})$

^(£^)! = 0(m7n) = o(1) (411) 

fc=2 

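we see that the subsum of (3.4) over $\mathcal{P}$ with $\sum_{k=2}^{m} n_k > 0$ is $o\big(p\,\mu_{n,p}^{m}/m^{3/2}\big)$.
Finally we note that the subsum over the paths of the type $(n-m, m, 0, 0, \dots, 0)$
is

$$\frac{(n/p)^{1/4}(\sqrt{n/p}+1)}{2\sqrt{\pi}}\; \frac{p\,\mu_{n,p}^{m}}{m^{3/2}}\,(1+o(1)), \qquad (4.12)$$
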

because for such paths we can choose the vertices from $\mathcal{N}_1$ exactly in
$p^{m}\,(n/p)^{\#(X)}\,(1+o(1))$ different ways (if $m = o(p^{1/2})$), and the first point of
a path in $p$ different ways. Combining (4.11) and (4.12) we prove the first
part of Theorem 3. To prove part b) we observe that if $m = O(p^{1/2})$, the
l.h.s. of (4.11) is still $O(m^2/n)$, which together with (4.10) and (4.12)
finishes the proof. Theorem 3 is proven.
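
The asymptotics (4.7) behind the leading term (4.12) can be sanity-checked numerically. The sketch below compares the exact value of $g_m(y)$ with the leading term $y^{1/4}(\sqrt{y}+1)^{2m+1}/(2\sqrt{\pi}\,m^{3/2})$, which is how we read (4.7). The function names are illustrative, $y$ is assumed to be a positive integer so that the exact count stays in integer arithmetic, and logarithms are used because the numbers themselves quickly exceed the floating-point range.

```python
import math

def g_exact(m, y):
    """Exact g_m(y) for a positive integer y: sum over trajectories X of
    length 2m of y**#(X), by dynamic programming over heights."""
    weights = {0: 1}  # weights[h] = total weight of prefixes ending at height h
    for t in range(2 * m):
        up_weight = y if t % 2 == 0 else 1  # even-instant up-steps carry weight y
        nxt = {}
        for h, w in weights.items():
            if h + 1 <= 2 * m - t - 1:      # height must still allow a return to 0
                nxt[h + 1] = nxt.get(h + 1, 0) + w * up_weight
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + w
        weights = nxt
    return weights.get(0, 0)

def log_leading_term(m, y):
    """Logarithm of y^(1/4) (sqrt(y)+1)^(2m+1) / (2 sqrt(pi) m^(3/2))."""
    return (0.25 * math.log(y)
            + (2 * m + 1) * math.log(math.sqrt(y) + 1)
            - math.log(2 * math.sqrt(math.pi))
            - 1.5 * math.log(m))

def ratio(m, y):
    """Exact value divided by the leading term; should approach 1 as m grows."""
    return math.exp(math.log(g_exact(m, y)) - log_leading_term(m, y))
```

For $y = 1$ the leading term reduces to the familiar Catalan asymptotics $4^m/(\sqrt{\pi}\,m^{3/2})$, and comparing `ratio(50, 2)` with `ratio(400, 2)` shows the relative error shrinking as $m$ grows.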

To derive Corollary 1 from Theorem 3 we apply Chebyshev's inequality (similarly
to the proof of Corollary 2 in Section 3) and the Borel-Cantelli lemma.
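
The Chebyshev and Borel-Cantelli step can be spelled out as follows. This is only a sketch under assumptions that are ours and not fixed in the text: $n/p$ stays bounded, $m$ is an integer with $p^{1/2}/2 \le m \le p^{1/2}$, $s = C\,p^{1/2}\log p$ with a sufficiently large constant $C$, and part b) of Theorem 3 is used in the form $E\,\mathrm{Trace}\,A_p^{m} \le \mathrm{const}\cdot p\,\mu_{n,p}^{m}/m^{3/2}$.

```latex
% Sketch under the assumptions stated above (our choice of m, s and C).
% We use \log(1+x) \ge x/2 for 0 \le x \le 1, with x = s/\mu_{n,p} \to 0,
% and \mu_{n,p} = p\,(\sqrt{n/p}+1)^2.
\begin{align*}
P\bigl(\lambda_{\max}(A_p) > \mu_{n,p} + s\bigr)
  &\le \frac{E\,\mathrm{Trace}\,A_p^{m}}{(\mu_{n,p}+s)^{m}}
   \le \mathrm{const}\cdot\frac{p}{m^{3/2}}
       \Bigl(1+\frac{s}{\mu_{n,p}}\Bigr)^{-m} \\
  &\le \mathrm{const}\cdot p^{1/4}
       \exp\Bigl(-\frac{m\,s}{2\mu_{n,p}}\Bigr)
   \le \mathrm{const}\cdot p^{\,1/4 \,-\, C/(4(\sqrt{n/p}+1)^{2})}.
\end{align*}
```

If $C$ is chosen so large that the exponent is below $-1$, these probabilities are summable in $p$, and the Borel-Cantelli lemma gives $\lambda_{\max}(A_p) \le \mu_{n,p} + C\,p^{1/2}\log p$ for all sufficiently large $p$ almost surely.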